Quantization of Continuous Input Variables for Binary Classification

Authors

  • Michal Skubacz
  • Jaakko Hollmén
Abstract

Quantization of continuous variables is important in data analysis, especially for model classes such as Bayesian networks and decision trees, which use discrete variables. Often, the discretization is based only on the distribution of the input variables, whereas additional information, for example in the form of class membership, is frequently available and could be used to improve the quality of the results. In this paper, quantization methods based on equal-width intervals, maximum entropy, maximum mutual information, and a novel approach based on maximum mutual information combined with entropy are considered. The former two approaches do not take class membership into account, whereas the latter two do. The relative merits of each method are compared in an empirical setting, where results are shown for two data sets in a direct marketing problem, and the quality of quantization is measured by mutual information and by the performance of Naive Bayes and C5 decision tree classifiers.
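As a rough illustration of the methods named in the abstract, the sketch below discretizes a toy variable three ways and scores each result by its mutual information with a binary class label. It is not the authors' implementation. The function names, the toy data, the quantile rule used for the maximum-entropy (equal-frequency) bins, and the restriction of the maximum-mutual-information search to a single threshold are all assumptions made for the example. The novel combined criterion is not specified in the abstract and is not attempted here.

```python
import math
from collections import Counter


def equal_width_edges(values, k):
    """Interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points at empirical quantiles (assumed rule), so bins hold
    roughly equal counts and the entropy of the bin variable is maximized."""
    s = sorted(values)
    n = len(s)
    return [s[(n * i) // k] for i in range(1, k)]


def assign_bins(values, edges):
    """Bin index of each value = number of cut points less than or equal to it."""
    return [sum(v >= e for e in edges) for v in values]


def entropy(symbols):
    """Shannon entropy in bits of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(bins, classes):
    """I(B; C) = H(B) + H(C) - H(B, C), in bits."""
    return entropy(bins) + entropy(classes) - entropy(list(zip(bins, classes)))


def best_mi_threshold(values, classes):
    """Single cut point (two bins only) that maximizes I(B; C).
    Candidates are midpoints between consecutive distinct values."""
    d = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(d, d[1:])]
    return max(
        candidates,
        key=lambda t: mutual_information(assign_bins(values, [t]), classes),
    )


# Hypothetical toy data: one outlier stretches the value range.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
classes = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

ew_bins = assign_bins(values, equal_width_edges(values, 2))
ef_bins = assign_bins(values, equal_frequency_edges(values, 2))
mi_cut = best_mi_threshold(values, classes)

mi_ew = mutual_information(ew_bins, classes)
mi_ef = mutual_information(ef_bins, classes)
mi_sup = mutual_information(assign_bins(values, [mi_cut]), classes)
print(f"equal width: {mi_ew:.3f}  equal frequency: {mi_ef:.3f}  max MI: {mi_sup:.3f}")
```

On this toy data the outlier pushes the equal-width cut far from the class boundary, so the equal-width bins carry little information about the class. That outcome reflects how the example was constructed and says nothing about the results reported in the paper.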

Similar resources

FORCED WATER MAIN DESIGN MIXED ANT COLONY OPTIMIZATION

Most real-world engineering design problems, such as cross-country water mains, include combinations of continuous, discrete, and binary decision variables. Very often, the binary decision variables are associated with the presence or absence of some nominated alternatives or project components. This study extends an existing continuous Ant Colony Optimization (ACO) algorithm to simultan...

Detection and Classification of Heart Premature Contractions via α-Level Binary Neyman-Pearson Radius Test: A Comparative Study

The aim of this study is to introduce a new methodology for isolating ectopic rhythms in ambulatory electrocardiogram (ECG) Holter data via appropriate statistical analyses that impose a reasonable computational burden. First, the events of the ECG signal are detected and delineated using a robust wavelet-based algorithm. Then, using the Binary Neyman-Pearson Radius test, an appropriate classifie...

Time-Mode Signal Quantization for Use in Sigma-Delta Modulators

The rapid scaling of modern CMOS technology has motivated researchers to design new analog-to-digital converter (ADC) architectures that work properly at lower supply voltages. Moving the data quantization procedure from the amplitude domain to the time domain can be a promising alternative that adapts well to technology scaling. This paper reviews the recent development in t...

Bearing fault diagnosis using CWT, BGA and Artificial Bee Colony Algorithm

Health diagnosis of bearings is essential to reduce the breakdowns of rotating machinery. An intelligent method to diagnose bearing faults using vibration signals is proposed. This paper proposes a binary genetic algorithm (BGA) for the feature selection process and discusses the role of fitness functions in feature selection by applying different fitness functions in the GA process. A v...

A Novel Noise-Robust Texture Classification Method Using Joint Multiscale LBP

In this paper we describe a novel noise-robust texture classification method using joint multiscale local binary patterns. The first step in texture classification is to describe the texture by extracting different features. So far, several methods have been developed for this task; one of the most popular is the Local Binary Pattern (LBP) method and its variants, such as Completed Local Binary...

Publication date: 2000